Delete by query API
Examples
Delete all documents from the my-index-000001 data stream or index:

Python:

    resp = client.delete_by_query(
        index="my-index-000001",
        conflicts="proceed",
        body={"query": {"match_all": {}}},
    )
    print(resp)

Ruby:

    response = client.delete_by_query(
      index: 'my-index-000001',
      conflicts: 'proceed',
      body: { query: { match_all: {} } }
    )
    puts response

Console:

    POST my-index-000001/_delete_by_query?conflicts=proceed
    {
      "query": {
        "match_all": {}
      }
    }

Delete documents from multiple data streams or indices:

Python:

    resp = client.delete_by_query(
        index=["my-index-000001", "my-index-000002"],
        body={"query": {"match_all": {}}},
    )
    print(resp)

Ruby:

    response = client.delete_by_query(
      index: 'my-index-000001,my-index-000002',
      body: { query: { match_all: {} } }
    )
    puts response

Console:

    POST /my-index-000001,my-index-000002/_delete_by_query
    {
      "query": {
        "match_all": {}
      }
    }

Limit the delete by query operation to shards that match a particular routing value:

Python:

    resp = client.delete_by_query(
        index="my-index-000001",
        routing="1",
        body={"query": {"range": {"age": {"gte": 10}}}},
    )
    print(resp)

Ruby:

    response = client.delete_by_query(
      index: 'my-index-000001',
      routing: 1,
      body: { query: { range: { age: { gte: 10 } } } }
    )
    puts response

Console:

    POST my-index-000001/_delete_by_query?routing=1
    {
      "query": {
        "range": {
          "age": {
            "gte": 10
          }
        }
      }
    }

By default, _delete_by_query uses scroll batches of 1000.
You can change the batch size with the scroll_size URL parameter:

Python:

    resp = client.delete_by_query(
        index="my-index-000001",
        scroll_size="5000",
        body={"query": {"term": {"user.id": "kimchy"}}},
    )
    print(resp)

Ruby:

    response = client.delete_by_query(
      index: 'my-index-000001',
      scroll_size: 5000,
      body: { query: { term: { 'user.id' => 'kimchy' } } }
    )
    puts response

Console:

    POST my-index-000001/_delete_by_query?scroll_size=5000
    {
      "query": {
        "term": {
          "user.id": "kimchy"
        }
      }
    }

Delete a document using a unique attribute:

Python:

    resp = client.delete_by_query(
        index="my-index-000001",
        body={"query": {"term": {"user.id": "kimchy"}}, "max_docs": 1},
    )
    print(resp)

Ruby:

    response = client.delete_by_query(
      index: 'my-index-000001',
      body: {
        query: { term: { 'user.id' => 'kimchy' } },
        max_docs: 1
      }
    )
    puts response

Console:

    POST my-index-000001/_delete_by_query
    {
      "query": {
        "term": {
          "user.id": "kimchy"
        }
      },
      "max_docs": 1
    }

Slice manually

Slice a delete by query manually by providing a slice ID and the total number of slices:

Python:

    resp = client.delete_by_query(
        index="my-index-000001",
        body={
            "slice": {"id": 0, "max": 2},
            "query": {"range": {"http.response.bytes": {"lt": 2000000}}},
        },
    )
    print(resp)

    resp = client.delete_by_query(
        index="my-index-000001",
        body={
            "slice": {"id": 1, "max": 2},
            "query": {"range": {"http.response.bytes": {"lt": 2000000}}},
        },
    )
    print(resp)

Ruby:

    response = client.delete_by_query(
      index: 'my-index-000001',
      body: {
        slice: { id: 0, max: 2 },
        query: { range: { 'http.response.bytes' => { lt: 2_000_000 } } }
      }
    )
    puts response

    response = client.delete_by_query(
      index: 'my-index-000001',
      body: {
        slice: { id: 1, max: 2 },
        query: { range: { 'http.response.bytes' => { lt: 2_000_000 } } }
      }
    )
    puts response

Console:

    POST my-index-000001/_delete_by_query
    {
      "slice": {
        "id": 0,
        "max": 2
      },
      "query": {
        "range": {
          "http.response.bytes": {
            "lt": 2000000
          }
        }
      }
    }

    POST my-index-000001/_delete_by_query
    {
      "slice": {
        "id": 1,
        "max": 2
      },
      "query": {
        "range": {
          "http.response.bytes": {
            "lt": 2000000
          }
        }
      }
    }

You can verify that this works with:
Python:

    resp = client.indices.refresh()
    print(resp)

    resp = client.search(
        index="my-index-000001",
        size="0",
        filter_path="hits.total",
        body={"query": {"range": {"http.response.bytes": {"lt": 2000000}}}},
    )
    print(resp)

Ruby:

    response = client.indices.refresh
    puts response

    response = client.search(
      index: 'my-index-000001',
      size: 0,
      filter_path: 'hits.total',
      body: { query: { range: { 'http.response.bytes' => { lt: 2_000_000 } } } }
    )
    puts response

Console:

    GET _refresh
    POST my-index-000001/_search?size=0&filter_path=hits.total
    {
      "query": {
        "range": {
          "http.response.bytes": {
            "lt": 2000000
          }
        }
      }
    }

This returns a sensible total, such as:

    {
      "hits": {
        "total": {
          "value": 0,
          "relation": "eq"
        }
      }
    }

Use automatic slicing

You can also let delete by query parallelize automatically, using sliced scroll to slice on _id. Use slices to specify the number of slices to use:

Python:

    resp = client.delete_by_query(
        index="my-index-000001",
        refresh=True,
        slices="5",
        body={"query": {"range": {"http.response.bytes": {"lt": 2000000}}}},
    )
    print(resp)

Ruby:

    response = client.delete_by_query(
      index: 'my-index-000001',
      refresh: true,
      slices: 5,
      body: { query: { range: { 'http.response.bytes' => { lt: 2_000_000 } } } }
    )
    puts response

Console:

    POST my-index-000001/_delete_by_query?refresh&slices=5
    {
      "query": {
        "range": {
          "http.response.bytes": {
            "lt": 2000000
          }
        }
      }
    }

You can verify that this also works with:

Python:

    resp = client.search(
        index="my-index-000001",
        size="0",
        filter_path="hits.total",
        body={"query": {"range": {"http.response.bytes": {"lt": 2000000}}}},
    )
    print(resp)

Ruby:

    response = client.search(
      index: 'my-index-000001',
      size: 0,
      filter_path: 'hits.total',
      body: { query: { range: { 'http.response.bytes' => { lt: 2_000_000 } } } }
    )
    puts response

Console:

    POST my-index-000001/_search?size=0&filter_path=hits.total
    {
      "query": {
        "range": {
          "http.response.bytes": {
            "lt": 2000000
          }
        }
      }
    }

This returns a sensible total, such as:

    {
      "hits": {
        "total": {
          "value": 0,
          "relation": "eq"
        }
      }
    }

Setting slices to auto will let Elasticsearch choose
the number of slices to use. This setting uses one slice per shard, up to a certain limit. If there are multiple source data streams or indices, it chooses the number of slices based on the index or backing index with the smallest number of shards.

Adding slices to _delete_by_query just automates the manual process used in the section above by creating sub-requests, which means it has some quirks:

- You can see these requests in the Tasks APIs. These sub-requests are "child" tasks of the task for the request with slices.
- Fetching the status of the task for the request with slices only returns the status of completed slices.
- These sub-requests are individually addressable for things like cancellation and rethrottling.
- Rethrottling the request with slices will rethrottle the unfinished sub-requests proportionally.
- Canceling the request with slices will cancel each sub-request.
- Due to the nature of slices, each sub-request won't get a perfectly even portion of the documents. All documents will be addressed, but some slices may be larger than others. Expect larger slices to have a more even distribution.
- Parameters like requests_per_second and max_docs on a request with slices are distributed proportionally to each sub-request. Combined with the uneven distribution noted above, this means that using max_docs with slices might not result in exactly max_docs documents being deleted.
- Each sub-request gets a slightly different snapshot of the source data stream or index, though these are all taken at approximately the same time.

Change throttling for a request

The value of requests_per_second can be changed on a running delete by query using the _rethrottle API. Rethrottling that speeds up the query takes effect immediately, but rethrottling that slows down the query takes effect only after the current batch completes, to prevent scroll timeouts.
PHP:

    $params = [
        'task_id' => 'r1A2WoRbTwKZ516z6NEs5A:36619',
        'requests_per_second' => -1,
    ];
    $response = $client->deleteByQueryRethrottle($params);

Python:

    resp = client.delete_by_query_rethrottle(
        task_id="r1A2WoRbTwKZ516z6NEs5A:36619",
        requests_per_second="-1",
    )
    print(resp)

Ruby:

    response = client.delete_by_query_rethrottle(
      task_id: 'r1A2WoRbTwKZ516z6NEs5A:36619',
      requests_per_second: -1
    )
    puts response

Go:

    res, err := es.DeleteByQueryRethrottle(
        "r1A2WoRbTwKZ516z6NEs5A:36619",
        esapi.IntPtr(-1),
    )
    fmt.Println(res, err)

JavaScript:

    const response = await client.deleteByQueryRethrottle({
      task_id: 'r1A2WoRbTwKZ516z6NEs5A:36619',
      requests_per_second: '-1'
    })
    console.log(response)

Console:

    POST _delete_by_query/r1A2WoRbTwKZ516z6NEs5A:36619/_rethrottle?requests_per_second=-1

Use the tasks API to get the task ID. Set requests_per_second to any positive decimal value, or to -1 to disable throttling.

Get the status of a delete by query operation

Use the tasks API to get the status of a delete by query operation:

PHP:

    $response = $client->tasks()->list();

Python:

    resp = client.tasks.list(
        detailed="true",
        actions="*/delete/byquery",
    )
    print(resp)

Ruby:

    response = client.tasks.list(
      detailed: true,
      actions: '*/delete/byquery'
    )
    puts response

Go:

    res, err := es.Tasks.List(
        es.Tasks.List.WithActions("*/delete/byquery"),
        es.Tasks.List.WithDetailed(true),
    )
    fmt.Println(res, err)

JavaScript:

    const response = await client.tasks.list({
      detailed: 'true',
      actions: '*/delete/byquery'
    })
    console.log(response)

Console:

    GET _tasks?detailed=true&actions=*/delete/byquery

The response looks like:

    {
      "nodes": {
        "r1A2WoRbTwKZ516z6NEs5A": {
          "name": "r1A2WoR",
          "transport_address": "127.0.0.1:9300",
          "host": "127.0.0.1",
          "ip": "127.0.0.1:9300",
          "attributes": {
            "testattr": "test",
            "portsfile": "true"
          },
          "tasks": {
            "r1A2WoRbTwKZ516z6NEs5A:36619": {
              "node": "r1A2WoRbTwKZ516z6NEs5A",
              "id": 36619,
              "type": "transport",
              "action": "indices:data/write/delete/byquery",
              "status": {
                "total": 6154,
                "updated": 0,
                "created": 0,
                "deleted": 3500,
                "batches": 36,
                "version_conflicts": 0,
                "noops": 0,
                "retries": 0,
                "throttled_millis": 0
              },
              "description": ""
            }
          }
        }
      }
    }

The status object contains the actual status. It is just like the response JSON, with the important addition of the total field. total is the total number of operations that the delete by query expects to perform. You can estimate the progress by adding the updated, created, and deleted fields. The request will finish when their sum is equal to the total field.

With the task ID you can look up the task directly:

PHP:

    $params = [
        'task_id' => 'r1A2WoRbTwKZ516z6NEs5A:36619',
    ];
    $response = $client->tasks()->get($params);

Python:

    resp = client.tasks.get(
        task_id="r1A2WoRbTwKZ516z6NEs5A:36619",
    )
    print(resp)

Ruby:

    response = client.tasks.get(
      task_id: 'r1A2WoRbTwKZ516z6NEs5A:36619'
    )
    puts response

Go:

    res, err := es.Tasks.Get(
        "r1A2WoRbTwKZ516z6NEs5A:36619",
    )
    fmt.Println(res, err)

JavaScript:

    const response = await client.tasks.get({
      task_id: 'r1A2WoRbTwKZ516z6NEs5A:36619'
    })
    console.log(response)

Console:

    GET /_tasks/r1A2WoRbTwKZ516z6NEs5A:36619

The advantage of this API is that it integrates with wait_for_completion=false to transparently return the status of completed tasks. If the task is completed and wait_for_completion=false was set on it, then it will come back with a results or an error field. The cost of this feature is the document that wait_for_completion=false creates at .tasks/task/${taskId}. It is up to you to delete that document.
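The progress estimate described in this section (the sum of updated, created, and deleted, compared against total) can be sketched in a few lines of Python. This is only an illustration, not part of any Elasticsearch client: estimate_progress is a hypothetical helper name, and it assumes the status dictionary has the shape shown in the sample tasks response.

```python
def estimate_progress(status):
    """Return the fraction of operations completed, between 0.0 and 1.0.

    Hypothetical helper: adds the updated, created, and deleted counters
    and divides the sum by total.
    """
    total = status.get("total", 0)
    if total <= 0:
        # Nothing to do, or total is not known yet: avoid dividing by zero.
        return 0.0
    done = (
        status.get("updated", 0)
        + status.get("created", 0)
        + status.get("deleted", 0)
    )
    return min(done / total, 1.0)


# Status object copied from the sample tasks response.
sample_status = {
    "total": 6154,
    "updated": 0,
    "created": 0,
    "deleted": 3500,
    "batches": 36,
    "version_conflicts": 0,
    "noops": 0,
    "retries": 0,
    "throttled_millis": 0,
}

print(f"{estimate_progress(sample_status):.1%}")  # prints 56.9%
```

In practice the status dictionary would come from the task entry returned by the tasks API rather than from a literal.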
Cancel a delete by query operation

Any delete by query can be canceled using the task cancel API:

PHP:

    $params = [
        'task_id' => 'r1A2WoRbTwKZ516z6NEs5A:36619',
    ];
    $response = $client->tasks()->cancel($params);

Python:

    resp = client.tasks.cancel(
        task_id="r1A2WoRbTwKZ516z6NEs5A:36619",
    )
    print(resp)

Ruby:

    response = client.tasks.cancel(
      task_id: 'r1A2WoRbTwKZ516z6NEs5A:36619'
    )
    puts response

Go:

    res, err := es.Tasks.Cancel(
        es.Tasks.Cancel.WithTaskID("r1A2WoRbTwKZ516z6NEs5A:36619"),
    )
    fmt.Println(res, err)

JavaScript:

    const response = await client.tasks.cancel({
      task_id: 'r1A2WoRbTwKZ516z6NEs5A:36619'
    })
    console.log(response)

Console:

    POST _tasks/r1A2WoRbTwKZ516z6NEs5A:36619/_cancel

The task ID can be found using the tasks API.

Cancellation should happen quickly but might take a few seconds. The task status API above will continue to list the delete by query task until the task checks that it has been canceled and terminates itself.
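Canceling a task requires its ID, which the tasks API response contains. As a rough sketch, the IDs can be collected from that response as follows. delete_by_query_task_ids is a hypothetical helper, not a client method; it assumes the response is grouped by node with a tasks mapping keyed by task ID, and it filters on the action name that appears in the sample response.

```python
# Action name as it appears in the sample tasks response.
DELETE_BY_QUERY_ACTION = "indices:data/write/delete/byquery"


def delete_by_query_task_ids(tasks_response):
    """Collect the IDs of delete by query tasks from a tasks API response."""
    task_ids = []
    for node in tasks_response.get("nodes", {}).values():
        for task_id, task in node.get("tasks", {}).items():
            if task.get("action") == DELETE_BY_QUERY_ACTION:
                task_ids.append(task_id)
    return task_ids


# Trimmed copy of the sample response from the status section.
sample_response = {
    "nodes": {
        "r1A2WoRbTwKZ516z6NEs5A": {
            "name": "r1A2WoR",
            "tasks": {
                "r1A2WoRbTwKZ516z6NEs5A:36619": {
                    "node": "r1A2WoRbTwKZ516z6NEs5A",
                    "id": 36619,
                    "type": "transport",
                    "action": "indices:data/write/delete/byquery",
                }
            },
        }
    }
}

for task_id in delete_by_query_task_ids(sample_response):
    # Each ID could then be passed as task_id to client.tasks.cancel(...).
    print(task_id)
```

Keeping the extraction separate from the cancel call makes it easy to inspect which tasks would be affected before canceling any of them.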